Weighted Kernel Model For Text Categorization
نویسندگان
چکیده
Traditional bag-of-words model and recent wordsequence kernel are two well-known techniques in the field of text categorization. Bag-of-words representation neglects the word order, which could result in less computation accuracy for some types of documents. Word-sequence kernel takes into account word order, but does not include all information of the word frequency. A weighted kernel model that combines these two models was proposed by the authors [1]. This paper is focused on the optimization of the weighting parameters, which are functions of word frequency. Experiments have been conducted with Reuter’s database and show that the new weighted kernel achieves better classification accuracy.
منابع مشابه
Improving the Performance of Text Categorization using N-gram Kernels
Kernel Methods are known for their robustness in handling large feature space and are widely used as an alternative to external feature extraction based methods in tasks such as classification and regression. This work follows the approach of using different string kernels such as n-gram kernels and gappy-n-gram kernels on text classification. It studies how kernel concatenation and feature com...
متن کاملWord Combination Kernel for Text Categorization
We proposed a novel kernel for text categorization. This kernel is an inner product in the feature space generated by all word combinations of specified length. A word combination is a collection of different words co-occurring in the same sentence. The word combination of length k is weighted by the k-th root of the product of the inverse document frequencies (IDF) of its words. A computationa...
متن کاملDomain Kernels for Text Categorization
In this paper we propose and evaluate a technique to perform semi-supervised learning for Text Categorization. In particular we defined a kernel function, namely the Domain Kernel, that allowed us to plug “external knowledge” into the supervised learning process. External knowledge is acquired from unlabeled data in a totally unsupervised way, and it is represented by means of Domain Models. We...
متن کاملKernel-based Text-categorization
This paper presents some techniques in text categorization. New algorithms, in particular a new SVM kernel for text categorization, are developed and compared to usual techniques. This kernel leads to a more natural space for elaborating separations than the euclid-ian space of frequencies or even inverse frequencies, as the distance in this space is the most usual distance between distribution...
متن کاملModeling Category Structures with a Kernel Function
We propose one type of TOP (Tangent vector Of the Posterior log-odds) kernel and apply it to text categorization. In a number of categorization tasks including text categorization, negative examples are usually more common than positive examples and there may be several different types of negative examples. Therefore, we construct a TOP kernel, regarding the probabilistic model of negative exam...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2006